The Common Pattern Specification Language

Authors

  • Douglas E. Appelt
  • Boyan A. Onyshkevych
Abstract

This paper describes the Common Pattern Specification Language (CPSL) that was developed during the TIPSTER program by a committee of researchers from the TIPSTER research sites. Many information extraction systems work by matching regular expressions over the lexical features of input symbols. CPSL was designed as a language for specifying such finite-state grammars for the purpose of specifying information extraction rules in a relatively system-independent way. The adoption of such a common language would enable the creation of shareable resources for the development of rule-based information extraction systems.

1. THE NEED FOR CPSL

As researchers have gained experience with information extraction systems, there has been some convergence of system architecture among those systems based on the knowledge engineering approach of developing sets of rules more or less by hand, targeted toward specific subjects. Some rule-based systems have achieved very high performance on such tasks as name identification. Ideally, developers of information extraction systems should be able to take advantage of the considerable effort that has gone into the development of such high-performance extraction system components. Unfortunately, this is usually impossible, in part because each system has a native formalism for rule specification, and the translation of rules from one native formalism to another is usually a slow, difficult, and error-prone process that ultimately discourages the sharing of system components or rule sets.

Over the course of the TIPSTER program and other information extraction efforts, many systems have converged on an architecture based on matching regular expression patterns over the lexical features of words in the input texts. The Common Pattern Specification Language (CPSL) was designed to take advantage of this convergence in architecture by providing a common formalism in which finite-state patterns could be represented.
This would then enable the development of shareable libraries of finite-state patterns directed toward specific extraction tasks, and hopefully remove one of the primary barriers to the fast development of high-performance information extraction systems. Together with common lexicon and annotation standards, such a language would let a developer exploit previous domain or scenario customization efforts and make use of the insights and the hard work of others in the extraction community. The CPSL was designed by a committee consisting of a number of researchers from the Government and all of the TIPSTER research sites involved in Information Extraction that are represented in this volume.

2. INTERPRETER ASSUMPTIONS

A pattern language is intended to be interpreted. Indeed, the interpreter is what gives the syntax of the language its meaning. Therefore, CPSL was designed with a loosely specified reference interpreter in mind. It was realized that extraction systems may not work exactly like the reference interpreter, and it was certainly not the goal of the designers to stifle creativity in system design. However, it was hoped that any system that implemented at least the functionality of the reference interpreter would, given appropriate lexicons, be able to use published shareable resources.
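The idea of matching a regular expression over the lexical features of input symbols, rather than over raw characters, can be illustrated with a small Python sketch. This is not CPSL syntax and not the reference interpreter. The `TITLES` lexicon, the three symbol classes, and the function names `symbol` and `match_spans` are all assumptions made up for this illustration. Each token is mapped to a one-character feature class, and an ordinary regular expression is then run over the resulting string of classes.

```python
import re

# Hypothetical mini-lexicon: words that carry the lexical feature "title".
TITLES = {"Dr.", "Mr.", "Ms."}


def symbol(word):
    """Map a token to a one-character class based on its lexical features."""
    if word in TITLES:
        return "T"  # title word, taken from the hypothetical lexicon
    if word[:1].isupper():
        return "C"  # capitalized word
    return "o"      # any other token


def match_spans(words, pattern):
    """Return (start, end) token index pairs where the pattern matches.

    The pattern is a regular expression over feature classes, so a
    character offset in the encoded string is also a token index.
    """
    encoded = "".join(symbol(w) for w in words)
    return [(m.start(), m.end()) for m in re.finditer(pattern, encoded)]


words = "the board named Dr. Jane Smith chairman".split()

# Illustrative rule: a title followed by one or more capitalized words.
spans = match_spans(words, r"TC+")
names = [" ".join(words[start:end]) for start, end in spans]
```

Here `names` holds the token sequences matched by the rule. Because the rule is written over feature classes and not over a particular system's internal data structures, the same pattern could in principle be reused by any system that assigns the same features, which is the kind of sharing the paper argues for.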
Report date: October 1998.

Similar resources

A Pattern Language for Communication Protocols

In this paper, we suggest a pattern language, a collection of related design patterns, for the development of communication protocols with an emphasis on an SDL (Specification and Description Language) implementation. The patterns are grouped in two categories: structural patterns and behavioral patterns. The structural patterns are focused on the architectural aspects of communication protocol...

Pattern Language for Specification of Communication Protocols

This paper presents the pattern language for specification of communication protocols. The pattern language contains four levels which are used to specify communication protocol and its messages. These four levels of this pattern language are: high-level protocol specification, protocol structure specification, message specification, and detailed message specification. Several existing patterns...

The SRI TIPSTER III Project

One step towards ease-of-use by nonexperts was the development reported in Phase II [1] of SRI's FastSpec language which enabled greater facility in generating and modifying the syntactic and semantic patterns necessary for identifying pertinent data. This was a motivating factor for the establishment of the Common Pattern Specification Language (CPSL) Working Group devoted to formulating a CPS...

From Problems to Programs: A Pattern Language to Go from Problem Requirements to Solution Schemas in Elementary Programming

The From Problems to Programs pattern language is an intent to formalize to a certain extent, via patterns, the programming process to go from the problem to the program. It outlines a common instructional framework to categorize, specify and solve the problems commonly taught in an introductory programming course. The language helps those educators interested on an organization of information ...

Generating a Pattern Matching Compiler by Partial Evaluation

Partial evaluation can be used for automatic generation of compilers and was first implemented in [10]. Since partial evaluation was extended to higher order functional languages [9] [2] it has become possible to write denotational semantics definitions of languages and implement these with very few changes in the language treated by partial evaluators. In this paper we use this technique to ge...

Measuring Test Properties Coverage for Evaluating UML/OCL Model-Based Tests

We propose in the paper a test property specification language, dedicated to UML/OCL models. This language is intended to express temporal properties on the executions of the system, that one wants to test. It is based on patterns, specifying the behaviours one wants to exhibit/avoid, and scopes, defining the piece of execution trace on which a given pattern applies. Each property is a combinat...

Publication date: 1998